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We propose a new theoretical method for the calculation of the interaction energy 
between macromolecular systems at large distances. The method provides a linear 
scaling of the computing time with the system size and is considered as an alternative 
to the well known fast multipole method. Its efficiency, accuracy and applicability 
to macromolecular systems is analyzed and discussed in detail. 



I. INTRODUCTION 



In recent years, there has been much progress in simulating the structure and dynamics 
of large molecules at the atomic level, which may include up to thousands and millions of 
atoms 1, [2I, 0]. For example, amorphous polymers may have segments each with 10000 
atoms 4j which associate to form partially crystalline lamellae, random coil regions, and 
interfaces between these regions, each of which may contribute with special mechanical and 
chemical properties to the system. 

With increasing computer powers nowadays it became possible to study molecular systems 
of enormous sizes which were not imaginable just several years ago. For example in Q a 
molecular dynamics simulations of the complete satellite tobacco mosaic virus was performed 
which includes up to 1 million of atoms. In that paper the stability of the whole virion and 
of the RNA core alone were demonstrated, and a pronounced instability was shown for the 
capsid without the RNA. 

The study of structure and dynamics of macromolecules often implies the calculation of 
the potential energy surface for the system. The potential energy surface of a macromolecule 
carries a lot of useful information about the system. For example from the potential energy 
landscape it is possible to estimate the characteristic times for the conformational changes 5|, 
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6|, |7j and for fragmentation |8j . The potential energy surface of a macromolecular system can 
)e used for studying the thermodynamical processes in the system such as phase transitions 
ij. In proteins, the potential energy surface is related to one of the most intriguing problems 
of protein physics: protein folding {9], [k], H, I2I, l^j. The rate constants for complex 
biochemical reactions can also be established from the analysis of the potential energy surface 

The calculation of the potential energy surface and molecular dynamics simulations often 
implies the evaluation of pairwise interactions. The direct method for evaluating these 
potentials is proportional to ~ iV 2 , where N is the number of particles in the system. 
This places a severe restraint on the treatable size of the system. During the last two 
decades many different methods have been sug gest ed which provide a linear dependence of 



the computational cost with respect to N 
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algorithm of this kind is the fast multipole method (FMM) 
The critical size of the system at which this method becomes computationally faster than the 
exact method is accuracy dependent and is very sensitive to the slope in the N dependence 



of the computational cost. In refs. 
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critical sizes ranging from N ~ 300 to 



N ~ 30000 have been reported. Many discrepancies of the estimates in the critical size arise 
from differences in the effort of optimizing the algorithm and the underlying code. However, 
it is also important to optimize the methods themselves with respect to the required accuracy. 

The FMM is based on the systematic organization of multipole representations of a local 
charge distribution, so that each particle interacts with local expansions of the potential. 
Original FMM was introduced in Q by Greengard and Rokh.in Later, Greengard's 
method has been implemented in various forms. Schmidt and Lee [20J have produced a 
version based upon the spherical multipoles for both periodic and nonperiodi c sy stems. 
Zhou and Johnson have implemented the FMM for use on parallel computers 261 ]. while 



Board et al have reported both serial and parallel versions of the FMM 5 



Ding et al introduced a version of the FMM that relies upon Cartesian rather than spher- 



ical multipoles 



18| . which they applied to very large scale molecular dynamics calculations. 



Additionally they modified Greengard's definition of the nearest neighbors to increase the 
proportion of interactions evaluated via local expansions. Shimada et al also developed a 
cartesian based FMM program [2^] , primarily to treat periodic systems described by molec- 
ular mechanics potentials. In both cases only low order multipoles were employed, since 



high accuracy was not sought. 

In the present paper we suggest a new method for calculating the interaction energy 
between macromolecules. Our method also provides a linear scaling of the computational 
costs with the size of the system and is based on the multipole expansion of the potential. 
However, the underlying ideas are quite different from the FMM. 

Assuming that atoms from different macromolecules interact via a pairwise Coulomb 
potential, we expand the potential around the centers of the molecules and build a two 
center multipole expansion using bipolar harmonics algebra. Finally, we obtain a general 
expression which can be used for calculating the energy and forces between the fragments. 
This approach is different from the one used in the FMM, where the so-called translational 
operators were used to expand the potential around a shifted center. Note that the final 



expression, which we suggest in our theory was not discussed be 
expressions were discussed since the earlier 50's (see e.g. [28 



ore wit 



lin the FMM. Similar 
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311]). In these papers 



the two center multipole expansion was considered as a new form of Coulomb potential 
expansion, but the expansion was never applied to the study of macromolecular systems. 

We consider the interaction of macromolecules via Coulomb potential since this is the 
only long-range interaction in macromolecules, which is important for the description of 
the potential energy surface at large distances. Other interaction terms in macromolecular 
systems are of the short-range type and become important when macromolecules get close 
to each other Q|. At large distances these terms can be neglected. 

In the present paper we show that the method based on the two center multipole ex- 
pansion can be used for computing the interaction energy between complex macromolecular 
systems. In section |TT] we present the formalism which lies behind the two center multipole 
expansion method. In subsection IIII Al we analyze the behavior of the computation cost of 
this method and establish the critical sizes of the system, when the two center multipole 
expansion method demands less computer time than the exact energy calculation approach. 
In subsection IIII Bl we compare the results of our calculation with the results obtained within 
the framework of the FMM. In section HVl we discuss the accuracy of the two center multipole 
expansion method. 



II. TWO CENTER MULTIPOLE EXPANSION METHOD 



In this section we present the formalism, which underlies the two center multipole expan- 
sion method, which will be further referred to as the TCM method. 

Let us consider two multi atomic systems, which we will denote as A and B. The pairwise 
Coulomb interaction energy of those systems can be written as follows: 

N A N B N A N B 

^^iRf-RAl ^^IR o + rf-ifl' [> 
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where Na and Nb are the total number of atoms in systems A and B respectively, qi and 
qj are the charges of atoms i and j from the system A and B respectively, Ro is the vector 
interconnecting the center of system A with the center of system B, rf and r? are the 
vectors describing the position of charges i and j with respect to the centers of the system 
A and B respectively. The centers of both systems can be any suitable points of each of the 
molecules. It is natural to define them as the centers of mass of the corresponding systems, 
but in some cases another choice might be more convenient (see for example Q], where we 
have applied the TCM method for studying fragmentation of alanine dipeptide). 

Expression (CQ) can be expanded into a series of spherical harmonics. The expansion 
depends on the vectors Ro, rf- and r?. In the present paper we consider the case when 

|R |>|rf| + |rf| (2) 

holds for all i and j. This particular case is important, because it describes well separated 
charge distributions, and can be used for modeling the interaction between complex objects 



at large distances. In this case the expansion of (CE]) reads as 321 ]: 
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According to [32| the function 



can be expanded into series of bipolar harmonics: 
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where a bipolar harmonic is defined as follows: 
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Here Cf^, are the Clebsch-Gordan coefficients, which can be transformed to the 3j- 
symbol notation as follows: 
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Using equations AID,® an d © we can rewrite expansion fl3j) as follows: 
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The multipole moments of systems A and B are defined as follows: 
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Summing equation (J7|) over i and j, and accounting only for the first L max multipoles in 
both systems, we obtain: 
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This expression describes the electrostatic energy of the system in terms of a two center 
multipole expansion. Note, that this expansion is only valid when the condition R > rf+r^ 
holds for all % and j, otherwise more sophisticated expansions have to be considered, which 
is beyond the scope of the present paper. 

Summation in equation ([9]) is performed over Zi,Z 2 £ [O...L max ]; mi £ [— Z1...Z1]; m 2 £ 
[— h-'-h], an d the condition M = m\ + ni2 holds. L max is the principal multipole number, 
which determines the number of multipoles in the expansion. 



III. COMPUTATIONAL EFFICIENCY 



A. Comparison with direct Coulomb interaction method 

In this section we discuss the computational efficiency of the TCM method. For this 
purpose we have analyzed the time required for computing the Coulomb interaction energy 
between two systems of charges and the time required for the energy calculation within the 
framework of the TCM method for different system sizes, and for different values of the 
principal multipole number. 

For the study of the computational efficiency of the TCM method we have considered 
the interaction between two systems (we denote them as A and B) of randomly distributed 
charges, for which the condition eq. §Z§ holds. The charges in both systems were randomly 

1/3 1/3 

distributed within the spheres of radii Ra = 1.0-N A and Rb = 1.0- N B respectively and the 
distance between the centers of mass of the two systems was chosen as Rq = 3/2(Ra + Rb)- 
The computational time needed for the energy calculations is proportional to the number 
of operations required. Thus, the time needed for the Coulomb energy calculation (CE 
calculation) can be estimated as: 



TCoul = OCCouiNaNb ~ iV 2 



(10) 



where acoui is a constant depending on the computer processor power and on the efficiency 
of the code, Na ~ Nb ~ N. From equation ffTUl) it follows that the computational cost of 
the CE calculation grows proportionally to the second power of the system size. 

For large systems the TCM method becomes more efficient because it provides a linear 
scaling with the system size. The time needed for the energy calculation reads as follows: 



where the first term, f3N, corresponds to the computer time needed for allocating arrays in 
memory and tabulating the computationally expensive functions like cos($) and exp(zm$). 
Ti jTn is the time needed for evaluation of the spherical harmonic at given / and m, and a mu it 
is a numerical coefficient, which depends on the processor power and on the efficiency of the 
code. In general it is different from acoui- 

In Fig. [T]we present the dependencies of the computer time needed for the CE calculation 
(squares) and for the computation of energy within the TCM method for different values of 
the principal multipole number as a function of system size. This data was obtained on a 
1.8 GHz 64-bit AMD Opteron-244 computer. 

From Fig. [1] it is clear that the time needed for the CE calculation has a prominent 
parabolic trend that is consistent with the analytical expression (11 01) . The fitting expression 
which describes this dependance is given in the first row of Tab. HI At large N the N 2 term 
becomes dominant and the other two terms can be neglected. Thus, occoui ~ 4.46 • 10 -8 



The fitting expressions which describe the time needed for the energy computations within 
the TCM method at different values of the principal multipole number are given in Tab. U 
rows 2-10. These expressions were obtained by fitting the data shown in Fig. [TJ Note the 
linear dependence on N. The numerical coefficient in all expressions correspond to the factor 
a mu itL max {l + L max ) 3 in equation (fTTI) . The fitting expressions in Tab. [Qwere obtained by 
fitting of data obtained for systems with large number of particles (see Fig. [TJ. Therefore 
these expressions are applicable when N ^> 1. 

From equations presented in Tab. [I] it is possible to determine the critical system sizes at 
which the TCM method becomes less computer time demanding then the CE calculation. 




(11) 



h=0 h=0 mi——li m2 =—h 
& mult L max (1 L max ) ^ t 



(sec). 




N 

FIG. 1: Time needed for energy calculation as a function of the system size. 

The critical system sizes calculated for different principal multipole numbers are shown in 
the third column of Tab. [D These sizes correspond to the intersection points of the parabola 
describing the time needed for the CE calculation with the straight lines describing the 
computational time needed for the TCM method. In Fig. [1] one can see six intersection 
points for L max = 2 — 7. 

From equation it follows that computation time of the energy within the framework 
of the TCM method grows as the power of 4 with increasing L max . To stress this fact, 
in Fig. [2] we present the dependencies of the computation time obtained within the TCM 
method at different system sizes as a function of principal multipole number. All curves 
shown in Fig. [2] can be perfectly fitted by the analytical expression fill I) . In the inset to 
Fig. [21 we plot the dependence of the fitting coefficient a mu i t as a function of the system 
size. From this plot it is seen that a mu i t varies only slightly for all system sizes considered, 
being equal to (1.982 ± 0.015) ■ 10~ 7 (sec). 

Thus, the expression for the time needed for the energy calculation within the framework 



TABLE I: Fitting expressions for the computational time needed for the CE calculation and for the 
energy computation within the TCM method at different values of the principal multipole number, 
Lmax (second column). System sizes, for which the Coulomb energy calculation becomes more 
computer time demanding at a given value of L max are shown in the third column. 



Lmax 


t(N) (sec.) 


N m ax 


Coulomb 


0.11736 - 0.0002iV + 4.6768 • 10" 8 iV 2 


- 


2 


-0.01986 + 3.0 • 10~ 5 iV 


4223 


3 


-0.03159 + 5.0 • 10~ 5 iV 


4662 


4 


-0.04714 + 1.0 • 10~ 4 iV 


5809 


5 


-0.16054 + 2.1 • 10~ 4 iV 


8026 


6 


-0.14710 + 3.7 • 10~ 4 iV 


11704 


7 


-0.59675 + 7.4 • 10" 4 iV 


19308 


8 


-0.35383 + 10.9 • 10" 4 iV 


27212 


9 


-1.15856 + 1.9 • 10~ 3 iV 


44286 


10 


-0.83688 + 2.71 • 10~ 3 iV 


61892 



of the TCM method reads as: 

T mu it(L max ) « 1.98 ■ 10~ 7 L max {\ + L max ) 3 N. (12) 

Note, that a mu i t = 1.98 ■ 1CT 7 (sec) is larger than acoui ~ 4.46 ■ 1CT 8 (sec), since in one turn 
of the TCM method more algebraic operations have to be done, than in one turn of the CE 
calculation. 

From the analysis performed in this section it is clear that the TCM method can give a 
significant gain in the computation time. However, at larger principal multipole numbers 
(L max = 8, 9, 10) this method can compete with the CE calculation only at system sizes 
greater than 27000-61000 atoms. The accounting for higher multipoles is necessary if the 
distance between two interacting systems becomes comparable to the size of the systems. In 
the next section we discuss in detail the accuracy of the TCM method and identify situations 
in which higher multipoles should be accounted for. 




L 
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FIG. 2: Time needed for the calculation of energy of the systems of different sizes computed within 
the framework of the TCM method as a function of the principal multipole number L max . In the 
inset we plot ot mu it as a function of the system size. 

B. Comparison with the fast multipole method 



The fast multipole method (FMM) 2l|, |22|, |23(] is a well known method for calculating 



the electrostatic energy in a multiparticle system, which provides a linear scaling of the 
computing time with the system size. In order to stress the computational efficiency of the 
TCM method in this section we compare the time required for the energy calculation within 
the framework of the FMM and using the TCM method. 

To perform such a comparison we used an adaptive FMM library, which has been im- 
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We have generated two 



plemented for the Coulomb potential in three dimensions 
random charge distributions of different size and calculated the interaction energies between 
them as well as the required computation time using the FMM and the TCM methods. As 
in the previous section the charges in both systems were randomly distributed within the 
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FIG. 3: Time needed for the calculation of the interaction energy between two systems as a function 
the total number of particles calculated within the framework of the TCM method (triangles) and 
within the framework of the FMM (squares). In the upper and lower insets we plot the relative 
error of the FMM and of the TCM methods as a function of the system size respectively. 

1/3 1/3 

spheres of radii Ra = 1.0 ■ iV A and Rb = 1.0 • N B respectively and the distance between 
the center of mass of the two systems was chosen as Rq = 3/2 (R^ + Rb)- 

In Fig.[3]we present the comparison of the computer time needed for the FMM calculation 
(squares) and for the computation of energy within the TCM method (triangles) as a function 
of system size. These data were obtained on an Intel(R) Xeon(TM) CPU 2.40GHz computer. 
In the upper and lower insets of Fig. [3] we show the relative error of the FMM and of the 
TCM methods as a function of the system size respectively, which is defined as follows: 



(13) 



Here method indicates the FMM or the TCM methods. For comparing the efficiency of the 
two methods we have considered different charge distributions within the size range of 100 
to 10000 particles. Each point in Fig. [3] corresponds to a particular charge distribution. 
For each system size ten different charge distributions were used. The time of the FMM 
calculation depends on the charge distribution, as is clearly seen in Fig. [3j Note that for a 
given system size the calculation time of the FMM can change by more than a factor of 5, 
depending on the charge distribution (see points for N = 10000 in Fig. [3]). 

For all system sizes FMM requires some minimal computer time for calculating the energy 
of the system, which increases with the growth of system size (see Fig. [3]). The comparison 
of the minimal FMM computation time with the computation time required for the TCM 
method shows that the TCM method appears to be significantly faster than the FMM. For 
N = 10000 FMM requires at least 2.15 seconds to compute the energy, while TCM method 
requires 0.53 seconds, being approximately 4 times faster. 

The results of the TCM method calculation shown in Fig. [3j were obtained for L max = 2. 
The analysis of relative errors presented in the inset to Fig. [3] shows that with this principal 
multipole number it is possible to calculate the energy between two systems with an error 
of less than 1 % for almost arbitrary charge distribution. Note that for the same charge 
distributions the error of the FMM is much more, being about 5 % in almost all of the 
considered systems. This allows us to conclude that the TCM method is more efficient and 
more accurate than the classical FMM. 

It is important to mention that in the traditional implementation, FMM calculates the 
total electrostatic energy of the system while TCM method was developed for studying 
interaction energy between system fragments. It is possible to modify the FMM to study only 
interaction energies between different parts of the system. However, the computation cost of 
the modified FMM is expected to be higher than of the TCM method. This happens because, 
within the framework of the modified FMM method, the field created by one fragment of 
the system should be expanded in the multipole series and the interactions of the resulting 
multipole moments with the charges from another fragment should be calculated. Thus the 
computation cost of this method will be proportional to ■ Nr, where Na and Nb are the 
number of particles in two fragments, while the TCM method is proportional to + Nb- 
The computation cost of the modified version of the FMM depends quadratically on the 
size of the system, because in this method the interacting fragments should be considered as 



two independent cells, while traditional FMM uses a hierarchical subdivision of the whole 
system into cells to achieve linear scaling. 

So far we have considered only the interaction between two multi particle systems in 
vacuo, and demonstrated the efficiency of the TCM method in this case, although the TCM 
method can also be applied to the larger number of interacting systems. The study of 
structure and dynamics of biomolecular systems consisting of several components (i.e an 
ensemble of proteins, DNA, macromolecules in solution) is a separate topic, which is beyond 
the scope of this paper and deserves a separate investigation. 



IV. ACCURACY OF THE TCM METHOD. POTENTIAL ENERGY SURFACE 
FOR PORCINE PANCREATIC TRYPSIN/ SOYBEAN TRYPSIN INHIBITOR 

COMPLEX. 

We have calculated the interaction energy between two proteins within the framework of 
the TCM method and compared it with the exact Coulomb energy value. On the basis of 
this comparison we have concluded about the accuracy of the TCM method. 

In the present paper we have studied the interaction energy between the porcine pan 



creatic trypsin and the soybean trypsin inhibitor proteins (Protein Data Bank (PDB) 36] 



entry 1AVW [37|]). Trypsins are digestive enzymes produced in the pancreas in the form 
of inactive trypsinogens. They are then secreted into the small intestine, where they are 
activated by another enzyme into trypsins. The resulting trypsins themselves activate more 
trypsinogens (autocatalysis). Members of the trypsin family cleave proteins at the carboxyl 
side (or " C-terminus" ) of the amino acids lysine and arginine. Porcine pancreatic trypsin is a 
archetypal example. Its natural non-covalent inhibitor (porcine pancreatic trypsin inhibitor) 
inhibits the enzyme's activity in the pancreas, protecting it from self-digestion. 

Trypsin is also inhibited non-covalently by the soybean trypsin inhibitor from the soya 
bean plant, although this inhibitor is unrelated to the porcine pancreatic trypsin inhibitor 
family of inhibitors. Although the biological function of the soybean trypsin inhibitor is 
mostly unknown it is assumed to help defend the plant from insect attack by interfering 
with the insect digestive system. 

The structure of both proteins is shown in Fig. HI The coordinate frame used for our 
computations is marked in the figure. This coordinate frame is consistent with the standard 



coordinate frame used in the PDB. 




soybean trypsin inhibitor 



porcine pancreatic trypsin 



FIG. 4: Structure of the porcine pancreatic trypsin and soybean trypsin inhibitor with the coor- 
dinate frame used for the energy computation. Figure has been rendered with help of the VMD 
visualization package [381 ] 



We use this particular example as a model system in order to demonstrate the possible 
use of the TCM method. Therefore environmental effects are omitted and we consider only 
the protein-protein interaction in vacuo. The porcine pancreatic trypsin and the soybean 
trypsin inhibitor include 223 and 177 amino acids respectively. Both proteins include 5847 
atoms. Thus for such system size the TCM method is faster than the CE calculation if 
Lmax < 4 (see Tab. HJ). 

We have calculated the interaction energy between the porcine pancreatic trypsin and 
soybean trypsin inhibitor as a function of distance between the centers of masses of the 
fragments, Rq, and the angle G, which is determined as the angle between the x-axis and 
the vector Rq (see Fig. Hj). We have assumed that the porcine pancreatic trypsin is fixed 
in space at the center of the coordinate frame and have restricted Ro to the (xy)-plane. Of 
course, the two degrees of freedom considered are not sufficient for a complete description 



of the mutual interaction between the two systems. For this purpose at least six degrees 
of freedom are needed. However for our example of the energy calculation of the porcine 
pancreatic trypsin/soybean trypsin inhibitor complex within the framework of the TCM 
method the two coordinates Ro and are sufficient. 

The interaction energy of the porcine pancreatic trypsin with the soybean trypsin in- 
hibitor as the function of Rq and calculated within the framework of the TCM method is 
shown in Fig. El The Coulomb interaction energy between the two proteins is shown in the 
top-left plot. In jg] it has been shown that the interaction energy between two well sepa- 
rated biological fragments arises mainly due to the Coulomb forces. In the present paper we 
consider Rq G [58, 100] A and G [0,360]°, at which condition (j2J) holds and both proteins 
can be considered as separated. This means that the potential energy surface shown in the 
top- left plot of Fig. El describes the interaction energy between the porcine pancreatic trypsin 
and the soybean trypsin inhibitor on the level of accuracy of 90 % at least. 

The top-left plot of Fig. El shows that one can select several characteristic regions on 
the potential energy surface marked with numbers 1-4. The corresponding configurations 
(states) of the system are shown in Fig. El The potential energy surface is determined by the 
Coulomb interactions between atoms, thus at large distances it raises and asymptotically 
approaches to zero. State 1 has the maximum energy within the considered part of the 
potential energy surface because this state corresponds to the largest contact separation 
distance between porcine pancreatic trypsin and the soybean trypsin inhibitor being equal 
to 54.8 A. 

At smaller distances the potential energy decreases due to the attractive forces acting 
between the two proteins. State 2 corresponds to the minimum on the potential energy sur- 
face. It arises because a positively charged polar arginine (R125) from the porcine pancreatic 
trypsin approaches the negatively charged site of the soybean trypsin inhibitor, which in- 
cludes negatively charged polar amino acids glutamic acid (E549) and aspartic acid (D551) 
(see state 2 in Fig. [6]). The strong attraction between the amino acids leads to the formation 
of a potential well on the potential energy surface. This observation is essential for dynamics 
of the attachment process of two proteins, because it establishes the most probable angle at 
which the proteins stick in the (xy)-plane of the considered coordinate frame (0 = 192°). 

States 3 and 4 correspond to the saddle points on the potential energy surface and have 
energies higher than state 2. They are formed because at these configurations two positively 




FIG. 5: The interaction energies of the porcine pancreatic trypsin with the soybean trypsin inhibitor 
calculated as the function of Rq and O (see Fig. H]) within the framework of the TCM method at 
different values of the principal multipole number, L max . The principal multipole number is given 
above the corresponding image. The result of the CE calculation is shown in the top left plot. 

charged polar amino acids from the two proteins become closer in space providing a source 
of a local repulsive force. In state 3 these amino acids are lysines (K145 and K665) (see 
state 3 in Fig. [6]), and in state 4 these are arginines (R62 and R563)(see state 4 in Fig. [6]). 
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FIG. 6: Relative orientations of the porcine pancreatic trypsin and the soybean trypsin inhibitor, 
corresponding to the selected points on the potential energy surface presented in Fig. [5j Below 
each image we give the corresponding values of Ro and 0. Some important amino acids are marked 
according to their PDB id. Figure prepared with help of the VMD visualization package 38] 



In the top-right plot of Fig. [5] we present the potential energy surface obtained within 
the framework of the TCM method with L max = 2, i.e. with accounting for up to the 
quadrupole-quadrupole interaction term in the multipole expansion ([9]). From the figure it 
is seen that the TCM method describes correctly the major features of the potential energy 
landscape (i.e. the position of the minimum and maximum as well as their relative energies). 
However, the minor details of the landscape, such as the saddle points 3 and 4 (see top-left 
plot of Fig. E]) are missed. 

The relative error of the TCM method can be defined as follows: 



V( L max)(Ro, 0) 



\u coul (R ,e)-u^(R ,e)\ 



■ 100%, 



(14) 
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where U cou i(Ro, 0) and U^i t x (Ro, 0) are the Coulomb energy and the energy calculated at 
given values of Ro and G within the TCM method respectively. In the top-left plot of Fig. [7] 



we present the relative error calculated according to (j!4p for L max = 2. From this plot it 
is clear that significant deviation from the exact result arise at ~ 50 — 60°, 140 — 150°, 
245°, 300 - 310° and 350°. The discrepancy at ~ 50 - 60°, ~ 300 - 310° and ~ 350° 
arises because the saddle points 3 and 4, can not be described within the framework of TCM 
method with L max = 2. The discrepancy at ~ 140 — 150° and ~ 245° is due to the 
error in the calculation of the slopes of minimum 2 at Rq = 58 A and = 198°. 




R(A) 

FIG. 7: Relative error of the interaction energies of the porcine pancreatic trypsin with the soybean 
trypsin inhibitor calculated as the function of Rq and G within the framework of the TCM method 
at different values of the principal multipole number, L max . The principal multipole number is 
given above the corresponding image. 

It is worth noting that the relative error of the TCM method with L max = 2 is less than 10 
%. With increasing distance between the proteins, the relative error decreases, and becomes 
less than 5 % at Rq > 72 A and less than 3 % at Rq > 86 A. This means that already at 
Lmax = 2 the TCM method reproduces with a reasonable accuracy the essential features of 
the potential energy landscape. This observation is very important, because TCM method 
with L max = 2 requires less computer time then the CE calculation already at N = 4223 



(see Tab. [T|). Thus, the TCM method can be used for the identification of major minima 
and maxima on the potential energy surface of macromolecules and modeling dynamics of 
complex molecular systems. 

Accounting for higher multipoles in the multipole expansion ([9]) leads to a more accurate 
calculation of the potential energy surface. In the second row of Fig. [5] we present the 
potential energy surfaces obtained within the framework of the TCM method with L max = 4 
and 6. From these plots it is seen that all minor details of the Coulomb potential energy 
surface, such as the saddle points 3 and 4 are reproduced correctly. Figure [7] shows that the 
TCM method with L max = 4 gives the maximal relative error of about 5 % at Rq = 58 A 
and O = 75°, in the vicinity of the saddle point 3. The relative errors in the vicinity of the 
saddle point 4 and minimum 2 are equal to 4 % and 1 % respectively. The error becomes 
less then 1 % for all values of angle at Rq > 70 A. For L max = 6, the largest relative error 
is equal to 1.5 % at R = 58 A and 6 = 340° (saddle point 4), becoming less then 1 % at 
R > 61 A. 

By accounting for the higher multipoles in the multipole expansion (J9j) one can increase 
the accuracy of the method. Thus, with L max = 8 and 10 it is possible to calculate the 
potential energy surface with the error less then 1 % (see bottom row in Fig. [5] and Fig. [7j). 
Although the time needed for computing the potential energy surfaces with L max = 8 and 
10 is larger than the time needed for computing the Coulomb energy directly (see Tab. [T]), 
we present these surfaces in order to stress the convergence of the TCM method. 

V. CONCLUSION 

In the present paper we have proposed a new method for the calculation of the Coulomb 
interaction energy between pairs of macromolecular objects. The suggested method provides 
a linear scaling of the computational costs with the size of the system and is based on the 
two center multipole expansion of the potential. Analyzing the dependence of the required 
computer time on the system size, we have established the critical sizes at which our method 
becomes more efficient than the direct calculation of the Coulomb energy. 

The comparison of efficiency of the TCM method with the efficiency of FMM allows us to 
conclude that the TCM method has proved to be faster and more accurate than the classical 
FMM. 



The method based on the two center multipole expansion can be used for the efficient 
computation of the interaction energy between complex macromolecular systems. To de- 
termine that we have considered the interaction between two proteins: porcine pancreatic 
trypsin and the soybean trypsin inhibitor. The accuracy of the method has been discussed 
in detail. It has been shown that accounting of only four multipoles in both proteins gives 
an error in the interaction energy less than 5 %. 

The TCM method is especially useful for studying dynamics of rigid molecules, but it can 
also be adopted for studying dynamics of flexible molecules. In this work we have developed 
a method for the efficient calculation of the interaction energy between pairs of large multi 
particle systems, e.g. macromolecules, being in vacuo. The investigation of biomolecular 
systems consisting of several components (i.e complexes of proteins, DNA, macromolecules 
in solution) and the extension of the TCM method for these cases deserves a separate 
investigation. If a system of interest consists of several interacting molecules being placed in 
a solution, one can use the TCM method to describe the interaction between the molecules 
and then to take account of the solution as implicit solvent. This can be achieved using for 
example the formalism of the Poisson-Bolzmann 3J, , similar to how it was implemented 

nn 

for the description of the antigen- antibody binding/unbinding process [141 . |15| . The other 
possibility is to split the whole system into boxes and account for the solvent explicitly by 
calculating the interactions between the boxes and the molecules of interest. This can be 
achieved by using the TCM method or a combination of the FMM and the TCM methods. 
In this case the FMM can be used for the calculation of the resulting effective multipole 
moment of the solvent, while the TCM method is much better suitable for the description of 
the macromolecules energetics and dynamics. Note that all of the suggested methodologies 
provide linear scaling of the computation time on the system size. 

The results of this work can be utilized for the description of complex molecular systems 
such as viruses, DNA, protein complexes, etc and their dynamics. Many dynamical features 
and phenomena of these systems are caused by the electrostatic interaction between their 
various fragments and thus the use of the two center multipole expansion method should 
give a significant gain in their computation costs. 
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